These are “unscaled” loadings and scores. Note that the prcomp() function in R centers the variables by default but does not scale them; pass scale. = TRUE to standardize each variable first.
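For reference, a minimal sketch of the two calls (scale. = TRUE standardizes each variable to unit variance before the decomposition):

```r
pca_raw <- prcomp(USArrests)                 # centered only (the default)
pca_std <- prcomp(USArrests, scale. = TRUE)  # centered and standardized
round(pca_std$rotation, 3)                   # the loadings (the U matrix)
```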
Visualizing the loadings
First four PCA components
Example: USArrests dataset
| State      | Murder | Assault | UrbanPop | Rape |
|------------|-------:|--------:|---------:|-----:|
| Alabama    |   13.2 |     236 |       58 | 21.2 |
| Alaska     |   10.0 |     263 |       48 | 44.5 |
| Arizona    |    8.1 |     294 |       80 | 31.0 |
| Arkansas   |    8.8 |     190 |       50 | 19.5 |
| California |    9.0 |     276 |       91 | 40.6 |
| Colorado   |    7.9 |     204 |       78 | 38.7 |

Murder, Assault, and Rape are arrests per 100,000 residents; UrbanPop is the percent urban population.
Loadings and scores
PCA loadings; the \(\mathbf{U}\) values
| Variable | PC1    | PC2    | PC3    | PC4    |
|----------|-------:|-------:|-------:|-------:|
| Murder   | -0.536 | -0.418 |  0.341 |  0.649 |
| Assault  | -0.583 | -0.188 |  0.268 | -0.743 |
| UrbanPop | -0.278 |  0.873 |  0.378 |  0.134 |
| Rape     | -0.543 |  0.167 | -0.818 |  0.089 |
Cumulative normalized variances
[1] 0.620060 0.867502 0.956642 1.000000
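These cumulative proportions can be recovered from the squared standard deviations of the components; a sketch using the standardized prcomp fit:

```r
pca <- prcomp(USArrests, scale. = TRUE)
# cumulative proportion of variance explained; for the standardized fit
# this matches the output above
cumsum(pca$sdev^2) / sum(pca$sdev^2)
```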
PCA scores, \(\mathbf{Z} = \mathbf{XU}\)
| State       | PC1       | PC2       | PC3       | PC4       |
|-------------|----------:|----------:|----------:|----------:|
| Alabama     | -0.975660 | -1.122001 |  0.439804 |  0.154697 |
| Alaska      | -1.930538 | -1.062427 | -2.019500 | -0.434175 |
| Arizona     | -1.745443 |  0.738460 | -0.054230 | -0.826264 |
| Arkansas    |  0.139999 | -1.108542 | -0.113422 | -0.180974 |
| California  | -2.498613 |  1.527427 | -0.592541 | -0.338559 |
| Colorado    | -1.499341 |  0.977630 | -1.084002 |  0.001450 |
| Connecticut |  1.344992 |  1.077984 |  0.636793 | -0.117279 |
| Delaware    | -0.047230 |  0.322089 |  0.711410 | -0.873113 |
| Florida     | -2.982760 | -0.038834 |  0.571032 | -0.095317 |
| Georgia     | -1.622807 | -1.266088 |  0.339018 |  1.065974 |
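As a sanity check, the scores can be computed directly as \(\mathbf{Z} = \mathbf{XU}\); a minimal sketch (the sign of an entire column is arbitrary and may flip between runs or platforms):

```r
pca <- prcomp(USArrests, scale. = TRUE)
X <- scale(USArrests)                # center and standardize, matching the fit
Z <- X %*% pca$rotation              # Z = XU
all.equal(unname(Z), unname(pca$x))  # TRUE: identical to the scores prcomp stores
```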
Visualizing the feature contributions in the components
Biplots
We can visualize the loadings and the scores together with a biplot.
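A minimal sketch with base R's biplot() on a prcomp fit:

```r
pca <- prcomp(USArrests, scale. = TRUE)
# scale = 0 plots the loadings (arrows) and scores (points) on their natural scales
biplot(pca, scale = 0, cex = 0.6)
```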
Non-linear dimension reduction
UMAP, t-SNE, autoencoders (neural networks)
UMAP: penguins revisited
We use just the four numerical predictors: bill length, bill depth, flipper length, and body mass.
| species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    | year |
|---------|-----------|---------------:|--------------:|------------------:|------------:|--------|-----:|
| Adelie  | Torgersen |           39.1 |          18.7 |               181 |        3750 | male   | 2007 |
| Adelie  | Torgersen |           39.5 |          17.4 |               186 |        3800 | female | 2007 |
| Adelie  | Torgersen |           40.3 |          18.0 |               195 |        3250 | female | 2007 |
| Adelie  | Torgersen |             NA |            NA |                NA |          NA | NA     | 2007 |
| Adelie  | Torgersen |           36.7 |          19.3 |               193 |        3450 | female | 2007 |
| Adelie  | Torgersen |           39.3 |          20.6 |               190 |        3650 | male   | 2007 |
Penguins pairplot
UMAP may be overkill here
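A minimal sketch of the embedding, assuming the uwot package for UMAP and palmerpenguins for the data:

```r
library(palmerpenguins)
library(uwot)  # assumed UMAP implementation

num_cols <- c("bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g")
ok <- complete.cases(penguins[, num_cols])      # drop rows with missing values
X  <- scale(as.matrix(penguins[ok, num_cols]))  # standardize the four predictors

set.seed(1)                                     # UMAP is stochastic
emb <- umap(X, n_neighbors = 15, min_dist = 0.1)
plot(emb, col = penguins$species[ok], pch = 19,
     xlab = "UMAP 1", ylab = "UMAP 2")
```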
UMAP
MNIST dataset: images of handwritten digits (0-9), with 784 features (28 × 28 pixels)
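A hedged sketch of how such an embedding might be computed, assuming dslabs for the MNIST download and uwot for UMAP:

```r
library(dslabs)  # read_mnist() downloads the data (assumed helper)
library(uwot)

mnist <- read_mnist()
idx <- 1:5000                     # subsample for speed
X   <- mnist$train$images[idx, ]  # 784 pixel intensities per image

set.seed(1)
emb <- umap(X, n_neighbors = 15, min_dist = 0.1)
plot(emb, col = mnist$train$labels[idx] + 1, pch = 20, cex = 0.5,
     xlab = "UMAP 1", ylab = "UMAP 2")
```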
What we covered
Why dimension reduction matters
- High-dimensional predictors and rank deficiency
- Variance, multicollinearity, and computational cost
- The curse of dimensionality (analytical + simulation intuition)
Two broad strategies
- Feature selection: keep a subset of the original predictors
- Feature extraction: construct new features as combinations of the original predictors (e.g., principal components)
Principal Components Analysis (PCA)
Class experiments/examples
meatspec: PCR with many correlated predictors
wine: classification with few principal components
USArrests: interpreting loadings, scores, and biplots
penguins: PCA vs UMAP for visualization
Non-linear dimension reduction